Skip to content

fix(streaming): normalize non-SSE upstream chat completions#412

Merged
SantiagoDePolonia merged 4 commits into
mainfrom
fix/connection-regression
Jun 18, 2026
Merged

fix(streaming): normalize non-SSE upstream chat completions#412
SantiagoDePolonia merged 4 commits into
mainfrom
fix/connection-regression

Conversation

@SantiagoDePolonia

@SantiagoDePolonia SantiagoDePolonia commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #411 — streaming chat completions where the connection appears to hang after the model has clearly finished, leaving the client "waiting for the model to continue outputting."

Root cause: Some OpenAI-compatible upstreams ignore stream:true and reply with a single buffered application/json completion (no data: framing, no [DONE]). The gateway forwarded that body verbatim under a text/event-stream content type, so SSE clients waited forever for an end-of-stream marker that never arrived.

Reproduced with bailian/deepseek-v3.2 (a dedicated Aliyun MaaS deepseek deployment that silently drops streaming):

HTTP/1.1 200 OK
Content-Type: text/event-stream                          ← gateway claims SSE
{"choices":[{"finish_reason":"stop","message":{...}}]}   ← but buffered JSON, no data:/[DONE]

Other models (qwen-flash, glm-5.1, deepseek-v4-*, OpenAI, Anthropic, Groq) stream normally — so this is a general robustness gap for any upstream that drops streaming, not a single-provider issue.

Not a version regression

The faulty path predates the versions named in the issue: CompatibleProvider.StreamChatCompletion (raw DoStream passthrough) is byte-identical at v0.1.40 and today, and the unconditional text/event-stream header has existed since the first commit. The v0.1.40..v0.1.42 diff touches only embeddings, health/readiness, dashboard, and storage — nothing in the chat-streaming path. The hang is triggered by the upstream model/endpoint behavior, not a gateway code change. This PR hardens the long-standing edge case.

Fix

New shared wrapper EnsureChatCompletionSSE, applied in CompatibleProvider.StreamChatCompletion so every OpenAI-compatible provider (bailian, groq, oracle, vllm, xiaomi, deepseek, openai) benefits. Following Postel's Law:

  • Genuine SSE (starts with data:, :, event:, …) passes through untouched, no buffering.
  • A buffered JSON completion (starts with {) is re-emitted as one SSE chunk (objectchat.completion.chunk, each choice's messagedelta) followed by a terminal data: [DONE].

Testing

  • New table-driven tests for EnsureChatCompletionSSE (buffered-JSON conversion, genuine-SSE passthrough, leading-comment passthrough, nil).
  • Manually verified with curl: deepseek-v3.2 now emits a proper chunk and [DONE]; previously-working providers still stream multi-chunk unchanged.
  • make test-race and make lint pass (pre-commit).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Chat completion streaming is now normalized to standard Server-Sent Events across providers, with consistent data: frames and a proper [DONE] terminator.
    • Improved behavior for non-SSE JSON responses and for upstream disconnects mid-stream, so already-received content is still delivered and the stream ends cleanly.
    • Added request validation in the XAI provider to return a clear error when the request is missing.
  • Tests
    • Expanded coverage for SSE pass-through (including comment-prefixed payloads), JSON-to-SSE conversion, and mid-body read error scenarios.

Some OpenAI-compatible upstreams ignore stream:true and reply with a
single buffered application/json completion (no data: framing, no
[DONE]). The gateway forwarded that body verbatim under a
text/event-stream content type, so SSE clients waited forever for an
end-of-stream marker that never arrived — the connection appeared to
hang after the model had clearly finished (issue #411).

Add EnsureChatCompletionSSE, applied in the shared
CompatibleProvider.StreamChatCompletion so every OpenAI-compatible
provider benefits. Genuine SSE streams pass through untouched with no
buffering; a buffered JSON completion is re-emitted as one SSE chunk
(object -> chat.completion.chunk, message -> delta) followed by a
terminal data: [DONE].

This is a long-standing latent defect, not a regression between the
versions named in the issue; the fix hardens the gateway against any
upstream that silently drops streaming.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 646778d2-cc5b-4fb2-98f8-4fd07d4404c2

📥 Commits

Reviewing files that changed from the base of the PR and between ee8ef7f and de658e3.

📒 Files selected for processing (2)
  • internal/providers/chat_stream_normalize.go
  • internal/providers/xai/xai.go

📝 Walkthrough

Walkthrough

Adds EnsureChatCompletionSSE, a new normalizer that inspects the first ~512 bytes of an upstream chat completion stream to determine whether it is buffered JSON or genuine SSE, converts buffered JSON to SSE format (rewriting the object field and messagedelta), and appends the terminal data: [DONE]. The function is wired into three provider implementations—CompatibleProvider, DeepSeek, and XAI—with explicit error checks before wrapping, and seven unit tests cover all branches including partial-body recovery on read failures.

Changes

SSE Stream Normalization

Layer / File(s) Summary
EnsureChatCompletionSSE and JSON-to-SSE conversion
internal/providers/chat_stream_normalize.go
New file adds EnsureChatCompletionSSE (peek-based JSON vs. SSE detection), bufferedCompletionToSSE (rewrites object, moves messagedelta, appends data: [DONE]), firstNonSpaceByte helper, and bufferedReadCloser wrapper.
Unit tests for all code paths
internal/providers/chat_stream_normalize_test.go
Defines errAfterReadCloser test helper and covers JSON-to-SSE rewriting with field transforms, byte-for-byte pass-through for real SSE, SSE-with-leading-comment pass-through, partial body preservation on read errors (ensuring [DONE] still appends), and nil-stream handling.
Provider StreamChatCompletion integration
internal/providers/openai/compatible_provider.go, internal/providers/deepseek/deepseek.go, internal/providers/xai/xai.go
Three providers now capture (stream, err) from p.client.DoStream, return errors without wrapping, and wrap successful streams with EnsureChatCompletionSSE before returning. Ensures all providers emit properly-terminated SSE regardless of upstream format.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Provider as Provider.StreamChatCompletion
    participant DoStream as p.client.DoStream
    participant Normalizer as EnsureChatCompletionSSE

    Client->>Provider: StreamChatCompletion(request)
    Provider->>DoStream: POST /chat/completions
    DoStream-->>Provider: (stream, err)
    alt err != nil
        Provider-->>Client: return nil, err
    else stream ok
        Provider->>Normalizer: wrap(stream)
        Normalizer->>Normalizer: peek first 512 bytes
        alt buffered JSON response
            Normalizer->>Normalizer: rewrite object to chat.completion.chunk
            Normalizer->>Normalizer: move message to delta fields
            Normalizer->>Normalizer: emit data: [DONE] terminator
        else genuine SSE stream
            Normalizer->>Normalizer: pass through via bufferedReadCloser
        end
        Normalizer-->>Provider: normalized SSE stream
        Provider-->>Client: normalized SSE stream
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A stream came in half-baked, no [DONE] in sight,
The rabbit peeked ahead and said, "this ain't right!"
It dressed up the JSON in data: clothes,
Appended the marker before the stream slows.
Now clients can rest when the model is through —
No hanging forever, hooray, it's brand new! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: normalizing non-SSE upstream chat completions by introducing SSE formatting where missing.
Linked Issues check ✅ Passed The PR implementation fully addresses issue #411 by introducing EnsureChatCompletionSSE to normalize non-SSE responses and append the [DONE] marker that was causing the hang.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the streaming chat completion issue: new SSE normalization logic, integration into compatible providers, and comprehensive tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/connection-regression

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Reviews (3): Last reviewed commit: "fix(streaming): avoid tainted size arith..." | Re-trigger Greptile

Comment thread internal/providers/chat_stream_normalize.go Outdated
Comment thread internal/providers/openai/compatible_provider.go
Comment thread internal/providers/chat_stream_normalize.go Outdated
Comment thread internal/providers/chat_stream_normalize.go Outdated
Comment thread internal/providers/chat_stream_normalize.go Fixed
@codecov-commenter

codecov-commenter commented Jun 18, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 77.19298% with 13 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/providers/chat_stream_normalize.go 79.06% 7 Missing and 2 partials ⚠️
internal/providers/deepseek/deepseek.go 50.00% 1 Missing and 1 partial ⚠️
internal/providers/xai/xai.go 66.66% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/providers/chat_stream_normalize.go`:
- Around line 40-44: The error handling path after io.ReadAll in the function
discards any partial data that was successfully read before the error occurred,
silently losing generated content. Instead of returning only chatDonePayload
when err is not nil, modify the error branch to combine the partial body content
that was read with the done payload, ensuring that any successfully generated
content is preserved and emitted before the DONE marker. This requires returning
both the partial data and the completion signal rather than discarding the
partial data entirely.
- Around line 33-35: The reader.Peek(peekForNonSSE) call blocks waiting for the
full 512 bytes (peekForNonSSE) to be available or EOF, which delays SSE streams
and makes streaming appear frozen. Instead of peeking the full peekForNonSSE
buffer size, use a smaller peek size that is sufficient to detect whether the
response starts with a JSON object (indicated by the opening brace character)
without blocking unnecessarily. Consider using a minimal peek size like 1 byte
or a small reasonable amount to check the first non-whitespace character,
allowing valid SSE streams to pass through immediately without waiting for the
full buffer to fill.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 854ae2ef-3f38-4502-b64e-9b0c100034b3

📥 Commits

Reviewing files that changed from the base of the PR and between 68cd937 and c999793.

📒 Files selected for processing (3)
  • internal/providers/chat_stream_normalize.go
  • internal/providers/chat_stream_normalize_test.go
  • internal/providers/openai/compatible_provider.go

Comment thread internal/providers/chat_stream_normalize.go Outdated
Comment thread internal/providers/chat_stream_normalize.go Outdated
SantiagoDePolonia and others added 2 commits June 18, 2026 12:42
Address PR review feedback on the chat-completions SSE normalizer:

- Classify the stream from its first non-whitespace byte via incremental
  Peek instead of Peek(512), so a genuine SSE upstream that emits a small
  first token then pauses is no longer held back waiting for a full buffer.
- Preserve partially-read bodies: when the upstream looked like buffered
  JSON but the read failed mid-body, forward what arrived before [DONE]
  instead of replacing it with a bare done marker.
- Apply EnsureChatCompletionSSE to the deepseek and xai providers, which
  stream via raw DoStream and previously bypassed the normalizer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CodeQL flagged the make() capacity computation (len(payload)+constants) as a
possible allocation-size overflow. Build the normalized SSE chunk with a
bytes.Buffer instead, which removes the explicit size arithmetic; behavior is
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/providers/chat_stream_normalize.go`:
- Around line 41-44: The condition at line 41 in the error handling block checks
both `err != nil && len(body) == 0`, but this is unreachable dead code. After
the Peek operation on line 34 successfully identifies the JSON opening brace
character, that byte is buffered and will be returned by the subsequent
io.ReadAll call, ensuring the body is never empty. Remove the `len(body) == 0`
check from the condition and keep only `if err != nil` to eliminate the dead
code branch and clarify the actual error handling path.

In `@internal/providers/xai/xai.go`:
- Around line 186-191: The StreamChatCompletion method in the Provider type
dereferences the req parameter without validating it is not nil, which will
cause a panic if a nil request is passed. Add a nil-check guard at the beginning
of the StreamChatCompletion method that returns an appropriate error if req is
nil, following the same pattern already implemented in the DeepSeek provider
referenced in this PR. This should occur before the req.WithStreaming() call on
line 190.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1fc2ef05-f172-41c1-af83-7e6f3c50deea

📥 Commits

Reviewing files that changed from the base of the PR and between c999793 and ee8ef7f.

📒 Files selected for processing (4)
  • internal/providers/chat_stream_normalize.go
  • internal/providers/chat_stream_normalize_test.go
  • internal/providers/deepseek/deepseek.go
  • internal/providers/xai/xai.go

Comment thread internal/providers/chat_stream_normalize.go Outdated
Comment thread internal/providers/xai/xai.go
Address follow-up PR review:

- xai StreamChatCompletion now guards a nil request before dereferencing it,
  matching the deepseek provider's contract in this PR.
- Remove the unreachable len(body)==0 branch in EnsureChatCompletionSSE: the
  '{' that classifies the body is already buffered, so io.ReadAll always
  returns at least that byte. bufferedCompletionToSSE already forwards partial
  bytes and appends [DONE], so behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SantiagoDePolonia SantiagoDePolonia merged commit 9025183 into main Jun 18, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1.4.2 The conversation cannot be properly ended.

3 participants